21 research outputs found

    Graph4Med: a web application and a graph database for visualizing and analyzing medical databases

    Background: Medical databases normally contain large amounts of data in a variety of forms. Although they grant significant insights into diagnosis and treatment, implementing data exploration in current medical databases is challenging, since these are often based on a relational schema and cannot easily be used to extract information for cohort analysis and visualization. As a consequence, valuable information regarding cohort distribution or patient similarity may be missed. With the rapid advancement of biomedical technologies, new forms of data from methods such as Next Generation Sequencing (NGS) or chromosome microarray (array CGH) are constantly being generated; hence it can be expected that the amount and complexity of medical data will rise and bring relational database systems to their limits. Description: We present Graph4Med, a web application that relies on a graph database obtained by transforming a relational database. Graph4Med provides straightforward visualization and analysis of a selected patient cohort. Our use case is a database of pediatric Acute Lymphoblastic Leukemia (ALL). Along with routine patient health records, it also contains results from recent technologies such as NGS. We developed a suitable graph data schema to convert the relational data into a graph data structure and store it in Neo4j. We used NeoDash to build a dashboard for querying and displaying patient cohort analyses. This way our tool (1) quickly displays an overview of patient cohort information such as distributions of gender, age, mutations (fusions), and diagnosis; (2) provides mutation (fusion)-based similarity search and display in a maneuverable graph; (3) generates an interactive graph of any selected patient and facilitates the identification of interesting patterns among patients. Conclusion: We demonstrate the feasibility and advantages of a graph database for storing and querying medical databases. Our dashboard allows fast and interactive analysis and visualization of complex medical data. It is especially useful for patient similarity search based on mutations (fusions), of which vast amounts of data have been generated by NGS in recent years. It can discover relationships and patterns in patient cohorts that are normally hard to grasp. Expanding Graph4Med to more medical databases will bring novel insights into diagnostic and research
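
The relational-to-graph conversion and the fusion-based similarity search described in this abstract can be illustrated with a minimal sketch. It does not use Neo4j or NeoDash and is not the Graph4Med schema: the table layout, the `HAS_FUSION` relationship name, the helper functions, and the three toy patients are all assumptions made up for illustration.

```python
from collections import defaultdict

# Hypothetical relational rows; table and column names are illustrative only.
patients = [
    {"patient_id": 1, "sex": "F", "age": 4},
    {"patient_id": 2, "sex": "M", "age": 7},
    {"patient_id": 3, "sex": "M", "age": 5},
]
patient_fusions = [  # join table: one row per (patient, fusion) pair
    {"patient_id": 1, "fusion": "ETV6::RUNX1"},
    {"patient_id": 2, "fusion": "ETV6::RUNX1"},
    {"patient_id": 2, "fusion": "BCR::ABL1"},
    {"patient_id": 3, "fusion": "BCR::ABL1"},
]


def build_graph(patients, patient_fusions):
    """Turn rows into labelled nodes and (source, relation, target) edges."""
    nodes = {("Patient", p["patient_id"]): dict(p) for p in patients}
    edges = []
    for row in patient_fusions:
        fusion_key = ("Fusion", row["fusion"])
        # Each distinct fusion becomes a single shared node.
        nodes.setdefault(fusion_key, {"name": row["fusion"]})
        edges.append((("Patient", row["patient_id"]), "HAS_FUSION", fusion_key))
    return nodes, edges


def similar_patients(edges, patient_id):
    """Map every other patient to the set of fusions shared with patient_id."""
    by_fusion = defaultdict(set)
    for (_, pid), relation, (_, fusion) in edges:
        if relation == "HAS_FUSION":
            by_fusion[fusion].add(pid)
    shared = defaultdict(set)
    for fusion, pids in by_fusion.items():
        if patient_id in pids:
            for other in pids - {patient_id}:
                shared[other].add(fusion)
    return dict(shared)


nodes, edges = build_graph(patients, patient_fusions)
print(similar_patients(edges, 2))
```

Because each fusion is a single node, patients who share it are two hops apart in the graph, whereas the relational layout needs a self-join on the join table to answer the same question.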

    Systematic genetic analysis of pediatric patients with autoinflammatory diseases

    Monogenic autoinflammatory diseases (AID) encompass a growing group of inborn errors of the innate immune system causing unprovoked or exaggerated systemic inflammation. Diagnosis of monogenic AID requires an accurate description of the patients’ phenotype, and the identification of highly penetrant genetic variants in single genes is pivotal. We performed whole exome sequencing (WES) of 125 pediatric patients with suspected monogenic AID in a routine genetic diagnostic setting. Datasets were analyzed in a step-wise approach to identify the most feasible diagnostic strategy. First, we analyzed a virtual gene panel including 13 genes associated with known AID and, if no genetic diagnosis was established, we then analyzed a virtual panel of 542 genes published by the International Union of Immunological Societies, covering all known inborn errors of immunity (IEI). Subsequently, WES data were analyzed without pre-filtering for known AID/IEI genes. Analyzing 13 genes yielded a definite diagnosis in 16.0% (n = 20). The diagnostic yield was increased to 20.8% (n = 26) by analyzing 542 genes. Importantly, expanding the analysis to WES data did not increase the diagnostic yield in our cohort, neither in single WES analysis nor in trio-WES analysis. The study highlights that the cost- and time-saving analysis of virtual gene panels is sufficient to rapidly confirm the differential diagnosis in pediatric patients with AID. WES data or trio-WES data analysis as a first-tier diagnostic analysis in patients with suspected monogenic AID is of limited benefit.
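
The reported diagnostic yields follow directly from the counts in this abstract (20 and 26 diagnoses among 125 patients). The short check below reproduces them; the function and variable names are illustrative, not taken from the study.

```python
COHORT_SIZE = 125  # pediatric patients sequenced, as stated in the abstract

# Patients with a definite diagnosis after each step-wise analysis tier.
diagnosed = {"13-gene panel": 20, "542-gene panel": 26}


def diagnostic_yield(n_diagnosed, cohort_size):
    """Return the diagnostic yield as a percentage of the cohort."""
    return 100.0 * n_diagnosed / cohort_size


for tier, n in diagnosed.items():
    print(f"{tier}: {diagnostic_yield(n, COHORT_SIZE):.1f}%")

# Additional diagnoses gained by widening the panel from 13 to 542 genes.
extra_patients = diagnosed["542-gene panel"] - diagnosed["13-gene panel"]
extra_points = diagnostic_yield(extra_patients, COHORT_SIZE)
print(f"gain: {extra_patients} patients ({extra_points:.1f} percentage points)")
```

Widening the panel adds 6 diagnoses, i.e. 4.8 percentage points, while the abstract reports no further gain from unfiltered WES or trio-WES analysis.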

    The genomic and transcriptional landscape of primary central nervous system lymphoma

    Primary lymphomas of the central nervous system (PCNSL) are mainly diffuse large B-cell lymphomas (DLBCLs) confined to the central nervous system (CNS). Molecular drivers of PCNSL have not been fully elucidated. Here, we profile and compare the whole-genome and transcriptome landscape of 51 CNS lymphomas (CNSL) to 39 follicular lymphoma and 36 DLBCL cases outside the CNS. We find recurrent mutations in JAK-STAT, NFkB, and B-cell receptor signaling pathways, including hallmark mutations in MYD88 L265P (67%) and CD79B (63%), and CDKN2A deletions (83%). PCNSLs exhibit significantly more focal deletions of HLA-D (6p21) locus as a potential mechanism of immune evasion. Mutational signatures correlating with DNA replication and mitosis are significantly enriched in PCNSL. TERT gene expression is significantly higher in PCNSL compared to activated B-cell (ABC)-DLBCL. Transcriptome analysis clearly distinguishes PCNSL and systemic DLBCL into distinct molecular subtypes. Epstein-Barr virus (EBV)+ CNSL cases lack recurrent mutational hotspots apart from IG and HLA-DRB loci. We show that PCNSL can be clearly distinguished from DLBCL, having distinct expression profiles, IG expression and translocation patterns, as well as specific combinations of genetic alterations

    The trans-ancestral genomic architecture of glycemic traits

    Glycemic traits are used to diagnose and monitor type 2 diabetes and cardiometabolic health. To date, most genetic studies of glycemic traits have focused on individuals of European ancestry. Here we aggregated genome-wide association studies comprising up to 281,416 individuals without diabetes (30% non-European ancestry) for whom fasting glucose, 2-h glucose after an oral glucose challenge, glycated hemoglobin and fasting insulin data were available. Trans-ancestry and single-ancestry meta-analyses identified 242 loci (99 novel; P < 5 × 10⁻⁸), 80% of which had no significant evidence of between-ancestry heterogeneity. Analyses restricted to individuals of European ancestry with equivalent sample size would have led to 24 fewer new loci. Compared with single-ancestry analyses, equivalent-sized trans-ancestry fine-mapping reduced the number of estimated variants in 99% credible sets by a median of 37.5%. Genomic-feature, gene-expression and gene-set analyses revealed distinct biological signatures for each trait, highlighting different underlying biological pathways. Our results increase our understanding of diabetes pathophysiology by using trans-ancestry studies for improved power and resolution. A trans-ancestry meta-analysis of GWAS of glycemic traits in up to 281,416 individuals identifies 99 novel loci, of which one quarter was found due to the multi-ancestry approach, which also improves fine-mapping of credible variant sets. Peer reviewed

    TRY plant trait database – enhanced coverage and open access

    Plant traits - the morphological, anatomical, physiological, biochemical and phenological characteristics of plants - determine how plants respond to environmental factors, affect other trophic levels, and influence ecosystem properties and their benefits and detriments to people. Plant trait data thus represent the basis for a vast area of research spanning from evolutionary biology, community and functional ecology, to biodiversity conservation, ecosystem and landscape management, restoration, biogeography and earth system modelling. Since its foundation in 2007, the TRY database of plant traits has grown continuously. It now provides unprecedented data coverage under an open access data policy and is the main plant trait database used by the research community worldwide. Increasingly, the TRY database also supports new frontiers of trait‐based plant research, including the identification of data gaps and the subsequent mobilization or measurement of new data. To support this development, in this article we evaluate the extent of the trait data compiled in TRY and analyse emerging patterns of data coverage and representativeness. Best species coverage is achieved for categorical traits - almost complete coverage for ‘plant growth form’. However, most traits relevant for ecology and vegetation modelling are characterized by continuous intraspecific variation and trait–environmental relationships. These traits have to be measured on individual plants in their respective environment. Despite unprecedented data coverage, we observe a humbling lack of completeness and representativeness of these continuous traits in many aspects. We, therefore, conclude that reducing data gaps and biases in the TRY database remains a key challenge and requires a coordinated approach to data mobilization and trait measurements. This can only be achieved in collaboration with other initiatives

    Rest of authors: Decky Junaedi, Robert R. Junker, Eric Justes, Richard Kabzems, Jeffrey Kane, Zdenek Kaplan, Teja Kattenborn, Lyudmila Kavelenova, Elizabeth Kearsley, Anne Kempel, Tanaka Kenzo, Andrew Kerkhoff, Mohammed I. Khalil, Nicole L. Kinlock, Wilm Daniel Kissling, Kaoru Kitajima, Thomas Kitzberger, Rasmus Kjøller, Tamir Klein, Michael Kleyer, Jitka Klimešová, Joice Klipel, Brian Kloeppel, Stefan Klotz, Johannes M. H. Knops, Takashi Kohyama, Fumito Koike, Johannes Kollmann, Benjamin Komac, Kimberly Komatsu, Christian König, Nathan J. B. Kraft, Koen Kramer, Holger Kreft, Ingolf Kühn, Dushan Kumarathunge, Jonas Kuppler, Hiroko Kurokawa, Yoko Kurosawa, Shem Kuyah, Jean-Paul Laclau, Benoit Lafleur, Erik Lallai, Eric Lamb, Andrea Lamprecht, Daniel J. Larkin, Daniel Laughlin, Yoann Le Bagousse-Pinguet, Guerric le Maire, Peter C. le Roux, Elizabeth le Roux, Tali Lee, Frederic Lens, Simon L. Lewis, Barbara Lhotsky, Yuanzhi Li, Xine Li, Jeremy W. Lichstein, Mario Liebergesell, Jun Ying Lim, Yan-Shih Lin, Juan Carlos Linares, Chunjiang Liu, Daijun Liu, Udayangani Liu, Stuart Livingstone, Joan Llusià, Madelon Lohbeck, Álvaro López-García, Gabriela Lopez-Gonzalez, Zdeňka Lososová, Frédérique Louault, Balázs A. Lukács, Petr Lukeš, Yunjian Luo, Michele Lussu, Siyan Ma, Camilla Maciel Rabelo Pereira, Michelle Mack, Vincent Maire, Annikki Mäkelä, Harri Mäkinen, Ana Claudia Mendes Malhado, Azim Mallik, Peter Manning, Stefano Manzoni, Zuleica Marchetti, Luca Marchino, Vinicius Marcilio-Silva, Eric Marcon, Michela Marignani, Lars Markesteijn, Adam Martin, Cristina Martínez-Garza, Jordi Martínez-Vilalta, Tereza Mašková, Kelly Mason, Norman Mason, Tara Joy Massad, Jacynthe Masse, Itay Mayrose, James McCarthy, M. Luke McCormack, Katherine McCulloh, Ian R. McFadden, Brian J. McGill, Mara Y. McPartland, Juliana S. Medeiros, Belinda Medlyn, Pierre Meerts, Zia Mehrabi, Patrick Meir, Felipe P. L. Melo, Maurizio Mencuccini, Céline Meredieu, Julie Messier, Ilona Mészáros, Juha Metsaranta, Sean T. Michaletz, Chrysanthi Michelaki, Svetlana Migalina, Ruben Milla, Jesse E. D. Miller, Vanessa Minden, Ray Ming, Karel Mokany, Angela T. Moles, Attila Molnár V, Jane Molofsky, Martin Molz, Rebecca A. Montgomery, Arnaud Monty, Lenka Moravcová, Alvaro Moreno-Martínez, Marco Moretti, Akira S. Mori, Shigeta Mori, Dave Morris, Jane Morrison, Ladislav Mucina, Sandra Mueller, Christopher D. Muir, Sandra Cristina Müller, François Munoz, Isla H. Myers-Smith, Randall W. Myster, Masahiro Nagano, Shawna Naidu, Ayyappan Narayanan, Balachandran Natesan, Luka Negoita, Andrew S. Nelson, Eike Lena Neuschulz, Jian Ni, Georg Niedrist, Jhon Nieto, Ülo Niinemets, Rachael Nolan, Henning Nottebrock, Yann Nouvellon, Alexander Novakovskiy, The Nutrient Network, Kristin Odden Nystuen, Anthony O'Grady, Kevin O'Hara, Andrew O'Reilly-Nugent, Simon Oakley, Walter Oberhuber, Toshiyuki Ohtsuka, Ricardo Oliveira, Kinga Öllerer, Mark E. Olson, Vladimir Onipchenko, Yusuke Onoda, Renske E. Onstein, Jenny C. Ordonez, Noriyuki Osada, Ivika Ostonen, Gianluigi Ottaviani, Sarah Otto, Gerhard E. Overbeck, Wim A. Ozinga, Anna T. Pahl, C. E. Timothy Paine, Robin J. Pakeman, Aristotelis C. Papageorgiou, Evgeniya Parfionova, Meelis Pärtel, Marco Patacca, Susana Paula, Juraj Paule, Harald Pauli, Juli G. Pausas, Begoña Peco, Josep Penuelas, Antonio Perea, Pablo Luis Peri, Ana Carolina Petisco-Souza, Alessandro Petraglia, Any Mary Petritan, Oliver L. Phillips, Simon Pierce, Valério D. Pillar, Jan Pisek, Alexandr Pomogaybin, Hendrik Poorter, Angelika Portsmuth, Peter Poschlod, Catherine Potvin, Devon Pounds, A. Shafer Powell, Sally A. Power, Andreas Prinzing, Giacomo Puglielli, Petr Pyšek, Valerie Raevel, Anja Rammig, Johannes Ransijn, Courtenay A. Ray, Peter B. Reich, Markus Reichstein, Douglas E. B. Reid, Maxime Réjou-Méchain, Victor Resco de Dios, Sabina Ribeiro, Sarah Richardson, Kersti Riibak, Matthias C. Rillig, Fiamma Riviera, Elisabeth M. R. Robert, Scott Roberts, Bjorn Robroek, Adam Roddy, Arthur Vinicius Rodrigues, Alistair Rogers, Emily Rollinson, Victor Rolo, Christine Römermann, Dina Ronzhina, Christiane Roscher, Julieta A. Rosell, Milena Fermina Rosenfield, Christian Rossi, David B. Roy, Samuel Royer-Tardif, Nadja Rüger, Ricardo Ruiz-Peinado, Sabine B. Rumpf, Graciela M. Rusch, Masahiro Ryo, Lawren Sack, Angela Saldaña, Beatriz Salgado-Negret, Roberto Salguero-Gomez, Ignacio Santa-Regina, Ana Carolina Santacruz-García, Joaquim Santos, Jordi Sardans, Brandon Schamp, Michael Scherer-Lorenzen, Matthias Schleuning, Bernhard Schmid, Marco Schmidt, Sylvain Schmitt, Julio V. Schneider, Simon D. Schowanek, Julian Schrader, Franziska Schrodt, Bernhard Schuldt, Frank Schurr, Galia Selaya Garvizu, Marina Semchenko, Colleen Seymour, Julia C. Sfair, Joanne M. Sharpe, Christine S. Sheppard, Serge Sheremetiev, Satomi Shiodera, Bill Shipley, Tanvir Ahmed Shovon, Alrun Siebenkäs, Carlos Sierra, Vasco Silva, Mateus Silva, Tommaso Sitzia, Henrik Sjöman, Martijn Slot, Nicholas G. Smith, Darwin Sodhi, Pamela Soltis, Douglas Soltis, Ben Somers, Grégory Sonnier, Mia Vedel Sørensen, Enio Egon Sosinski Jr, Nadejda A. Soudzilovskaia, Alexandre F. Souza, Marko Spasojevic, Marta Gaia Sperandii, Amanda B. Stan, James Stegen, Klaus Steinbauer, Jörg G. Stephan, Frank Sterck, Dejan B. Stojanovic, Tanya Strydom, Maria Laura Suarez, Jens-Christian Svenning, Ivana Svitková, Marek Svitok, Miroslav Svoboda, Emily Swaine, Nathan Swenson, Marcelo Tabarelli, Kentaro Takagi, Ulrike Tappeiner, Rubén Tarifa, Simon Tauugourdeau, Cagatay Tavsanoglu, Mariska te Beest, Leho Tedersoo, Nelson Thiffault, Dominik Thom, Evert Thomas, Ken Thompson, Peter E. Thornton, Wilfried Thuiller, Lubomír Tichý, David Tissue, Mark G. Tjoelker, David Yue Phin Tng, Joseph Tobias, Péter Török, Tonantzin Tarin, José M. Torres-Ruiz, Béla Tóthmérész, Martina Treurnicht, Valeria Trivellone, Franck Trolliet, Volodymyr Trotsiuk, James L. Tsakalos, Ioannis Tsiripidis, Niklas Tysklind, Toru Umehara, Vladimir Usoltsev, Matthew Vadeboncoeur, Jamil Vaezi, Fernando Valladares, Jana Vamosi, Peter M. van Bodegom, Michiel van Breugel, Elisa Van Cleemput, Martine van de Weg, Stephni van der Merwe, Fons van der Plas, Masha T. van der Sande, Mark van Kleunen, Koenraad Van Meerbeek, Mark Vanderwel, Kim André Vanselow, Angelica Vårhammar, Laura Varone, Maribel Yesenia Vasquez Valderrama, Kiril Vassilev, Mark Vellend, Erik J. Veneklaas, Hans Verbeeck, Kris Verheyen, Alexander Vibrans, Ima Vieira, Jaime Villacís, Cyrille Violle, Pandi Vivek, Katrin Wagner, Matthew Waldram, Anthony Waldron, Anthony P. Walker, Martyn Waller, Gabriel Walther, Han Wang, Feng Wang, Weiqi Wang, Harry Watkins, James Watkins, Ulrich Weber, James T. Weedon, Liping Wei, Patrick Weigelt, Evan Weiher, Aidan W. Wells, Camilla Wellstein, Elizabeth Wenk, Mark Westoby, Alana Westwood, Philip John White, Mark Whitten, Mathew Williams, Daniel E. Winkler, Klaus Winter, Chevonne Womack, Ian J. Wright, S. Joseph Wright, Justin Wright, Bruno X. Pinho, Fabiano Ximenes, Toshihiro Yamada, Keiko Yamaji, Ruth Yanai, Nikolay Yankov, Benjamin Yguel, Kátia Janaina Zanini, Amy E. Zanne, David Zelený, Yun-Peng Zhao, Jingming Zheng, Ji Zheng, Kasia Ziemińska, Chad R. Zirbel, Georg Zizka, Irié Casimir Zo-Bi, Gerhard Zotz, Christian Wirth. Max Planck Institute for Biogeochemistry; Max Planck Society; German Centre for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig; International Programme of Biodiversity Science (DIVERSITAS); International Geosphere-Biosphere Programme (IGBP); Future Earth; French Foundation for Biodiversity Research (FRB); GIS ‘Climat, Environnement et Société’. http://wileyonlinelibrary.com/journal/gcb hj2021 Plant Production and Soil Scienc

    The Clinical Utility of Optical Genome Mapping for the Assessment of Genomic Aberrations in Acute Lymphoblastic Leukemia

    Acute lymphoblastic leukemia (ALL) is the most prevalent type of cancer occurring in children. ALL is characterized by structural and numeric genomic aberrations that strongly correlate with prognosis and clinical outcome. Usually, a combination of cyto- and molecular genetic methods (karyotyping, array-CGH, FISH, RT-PCR, RNA-Seq) is needed to identify all aberrations relevant for risk stratification. We investigated the feasibility of optical genome mapping (OGM), a DNA-based method, to detect these aberrations in an all-in-one approach. As proof of principle, twelve pediatric ALL samples were analyzed by OGM, and results were validated by comparing OGM data to results obtained from routine diagnostics. All genomic aberrations including translocations (e.g., dic(9;12)), aneuploidies (e.g., high hyperdiploidy) and copy number variations (e.g., IKZF1, PAX5) known from other techniques were also detected by OGM. Moreover, OGM was superior to well-established techniques for resolution of the more complex structure of a translocation t(12;21) and had a higher sensitivity for detection of copy number alterations. Importantly, a new and unknown gene fusion of JAK2 and NPAT due to a translocation t(9;11) was detected. We demonstrate the feasibility of OGM to detect well-established as well as new putative prognostic markers in an all-in-one approach in ALL. We hope that these limited results will be confirmed with testing of more samples in the future

    Increased prostaglandin-D2 in male STAT3-deficient hearts shifts cardiac progenitor cells from endothelial to white adipocyte differentiation

    Cardiac levels of the signal transducer and activator of transcription factor-3 (STAT3) decline with age, and male but not female mice with a cardiomyocyte-specific STAT3 deficiency conditional knockout (CKO) display premature age-related heart failure associated with reduced cardiac capillary density. In the present study, isolated male and female CKO-cardiomyocytes exhibit increased prostaglandin (PG)-generating cyclooxygenase-2 (COX-2) expression. The PG-degrading hydroxyprostaglandin-dehydrogenase-15 (HPGD) expression is only reduced in male cardiomyocytes, which is associated with increased prostaglandin D2 (PGD2) secretion from isolated male but not female CKO-cardiomyocytes. Reduced HPGD expression in male cardiomyocytes derives from impaired androgen receptor (AR) signaling due to loss of its cofactor STAT3. Elevated PGD2 secretion in males is associated with increased white adipocyte accumulation in aged male but not female hearts. Adipocyte differentiation is enhanced in isolated stem cell antigen-1 (SCA-1)+ cardiac progenitor cells (CPC) from young male CKO-mice compared with the adipocyte differentiation of male wild-type (WT)-CPC and CPC isolated from female mice. Epigenetic analysis in freshly isolated male CKO-CPC displays hypermethylation in pro-angiogenic genes (Fgfr2, Epas1) and hypomethylation in the white adipocyte differentiation gene Zfp423, associated with up-regulated ZFP423 expression and a shift from endothelial to white adipocyte differentiation compared with WT-CPC. The expression of the histone-methyltransferase EZH2 is reduced in male CKO-CPC compared with male WT-CPC, whereas no differences in EZH2 expression in female CPC were observed. Clonally expanded CPC can differentiate into endothelial cells or into adipocytes depending on the differentiation conditions. ZFP423 overexpression is sufficient to induce white adipocyte differentiation of clonal CPC. In isolated WT-CPC, PGD2 stimulation reduces the expression of EZH2, thereby up-regulating ZFP423 expression and promoting white adipocyte differentiation. The treatment of young male CKO mice with the COX inhibitor Ibuprofen or the PGD2 receptor (DP)2 antagonist BAY-u 3405 in vivo increased EZH2 expression and reduced ZFP423 expression and adipocyte differentiation in CKO-CPC. Thus, cardiomyocyte STAT3 deficiency leads to age-related and sex-specific cardiac remodeling and failure, in part due to sex-specific alterations in PGD2 secretion and subsequent epigenetic impairment of the differentiation potential of CPC. Causally involved is the impaired AR signaling in the absence of STAT3, which reduces the expression of the PG-degrading enzyme HPGD.

    Implementation of RNA sequencing and array CGH in the diagnostic workflow of the AIEOP-BFM ALL 2017 trial on acute lymphoblastic leukemia

    Risk-adapted therapy has significantly contributed to improved survival rates in pediatric acute lymphoblastic leukemia (ALL) and reliable detection of chromosomal aberrations is mandatory for risk group stratification. This study evaluated the applicability of panel-based RNA sequencing and array CGH within the diagnostic workflow of the German study group of the international AIEOP-BFM ALL 2017 trial. In a consecutive cohort of 117 children with B cell precursor (BCP) ALL, array analysis identified twelve cases with an IKZF

    The Gene Expression Classifier ALLCatchR Identifies B-cell Precursor ALL Subtypes and Underlying Developmental Trajectories Across Age

    Current classifications (World Health Organization-HAEM5/ICC) define up to 26 molecular B-cell precursor acute lymphoblastic leukemia (BCP-ALL) disease subtypes by genomic driver aberrations and corresponding gene expression signatures. Identification of driver aberrations by transcriptome sequencing (RNA-Seq) is well established, while systematic approaches for gene expression analysis are less advanced. Therefore, we developed ALLCatchR, a machine learning-based classifier using RNA-Seq gene expression data to allocate BCP-ALL samples to all 21 gene expression-defined molecular subtypes. Trained on n = 1869 transcriptome profiles with established subtype definitions (4 cohorts; 55% pediatric / 45% adult), ALLCatchR allowed subtype allocation in 3 independent hold-out cohorts (n = 1018; 75% pediatric / 25% adult) with 95.7% accuracy (averaged sensitivity across subtypes: 91.1% / specificity: 99.8%). High-confidence predictions were achieved in 83.7% of samples with 98.9% accuracy. Only 1.2% of samples remained unclassified. ALLCatchR outperformed existing tools and identified novel driver candidates in previously unassigned samples. Additional modules provided predictions of the samples’ blast counts, the patients’ sex, and immunophenotype, allowing imputation in cases where this information is missing. We established a novel RNA-Seq reference of human B-lymphopoiesis using 7 FACS-sorted progenitor stages from healthy bone marrow donors. Implementation in ALLCatchR enabled projection of BCP-ALL samples to this trajectory. This identified shared proximity patterns of BCP-ALL subtypes to normal lymphopoiesis stages, extending immunophenotypic classifications with a novel framework for developmental comparisons of BCP-ALL. ALLCatchR enables RNA-Seq routine application for BCP-ALL diagnostics with systematic gene expression analysis for accurate subtype allocation and novel insights into underlying developmental trajectories.
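
The "averaged sensitivity across subtypes" and specificity figures in this abstract are most naturally read as one-vs-rest rates averaged over classes. The sketch below shows that calculation on made-up labels. The averaging scheme is an assumption, since the abstract does not state how the averages were formed, and the code is not part of ALLCatchR.

```python
def per_class_rates(y_true, y_pred, label):
    """One-vs-rest sensitivity and specificity for a single class label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    tn = sum(t != label and p != label for t, p in pairs)
    # Assumes every label has at least one positive and one negative sample.
    return tp / (tp + fn), tn / (tn + fp)


def macro_average(y_true, y_pred):
    """Unweighted mean of per-class sensitivity and specificity."""
    labels = sorted(set(y_true))
    rates = [per_class_rates(y_true, y_pred, label) for label in labels]
    sensitivity = sum(r[0] for r in rates) / len(rates)
    specificity = sum(r[1] for r in rates) / len(rates)
    return sensitivity, specificity


# Toy subtype labels, invented purely for illustration.
y_true = ["A", "A", "A", "B", "B", "C"]
y_pred = ["A", "A", "B", "B", "B", "C"]

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
sensitivity, specificity = macro_average(y_true, y_pred)
print(f"accuracy={accuracy:.3f} sensitivity={sensitivity:.3f} specificity={specificity:.3f}")
```

Macro-averaging weights rare and common subtypes equally, which is one reason an averaged sensitivity can sit below overall accuracy when small classes are harder to classify.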